- Short: ARexx script to collect HTML documents. v0.6
- Author: Arne Seime <aseime@iname.com>
- Uploader: Arne Seime <aseime@iname.com>
- Type: comm/www
- Requires: HTTPJ (comm/www/HTTPJ200.lha), rexxsupport.library
- Version: 0.6
- Replaces: AutoPage0.4.lha
-
- The script is freeware, but feel free to send me an email if you use it or
- have bug reports/suggestions.
-
- Idea: Cut phone bill costs.
- Result: Simple and probably buggy ARexx script to collect HTML pages. Works for me.
-
- You probably have a lot of HTML pages you check every time you are
- online to see if there have been any changes. I do. The script uses
- HTTPJ to check for updates/changes and, if any are found, fetches the page.
- Results are presented in an HTML page.
-
- Installation:
- Get hold of HTTPJ and place the executable in the same directory as
- AutoPage.rexx, httpj.rexx and sitelist.txt. rexxsupport.library (in lowercase)
- should be placed in sys:libs. Autopage.prefs should be in ENV: (and ENVARC:,
- of course).
-
- Configuration:
- As of this version, the script uses a prefs file. It should look like this:
-
- [Chopped right from the script]
- Say "<savedir> /* Directory to save pages in*/"
- Say "<progdir> /* Directory where program files are located */"
- Say "<connections> /* Number of connections to run at the same time */"
- Say "<loop delay> /* Time in 1/50 sec. Time to wait for ready connection */"
- Say "<buffers> /* Download buffer in kb each connection */"
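-
- To make the layout above concrete, here is a minimal Python sketch (not part
- of AutoPage, which is ARexx) that reads five values in the listed order. It
- assumes one value per line and that a trailing /* ... */ comment, if present,
- can be ignored; the script's real parsing may differ. The names Prefs,
- strip_comment and parse_prefs, and all example values, are hypothetical.

```python
from typing import NamedTuple

TICKS_PER_SECOND = 50  # the loop delay is given in units of 1/50 sec


class Prefs(NamedTuple):
    savedir: str           # directory to save pages in
    progdir: str           # directory where program files are located
    connections: int       # number of simultaneous connections
    loop_delay_ticks: int  # wait for a ready connection, in 1/50 sec
    buffer_kb: int         # download buffer per connection, in kB

    @property
    def loop_delay_seconds(self) -> float:
        return self.loop_delay_ticks / TICKS_PER_SECOND


def strip_comment(line: str) -> str:
    """Drop a trailing /* ... */ comment (assumed optional) and whitespace."""
    return line.split("/*", 1)[0].strip()


def parse_prefs(text: str) -> Prefs:
    """Read the five prefs values in the order the readme lists them."""
    values = [strip_comment(line) for line in text.splitlines()]
    values = [v for v in values if v]  # skip blank lines
    if len(values) != 5:
        raise ValueError(f"expected 5 prefs values, got {len(values)}")
    savedir, progdir, connections, delay, buffers = values
    return Prefs(savedir, progdir, int(connections), int(delay), int(buffers))


# Hypothetical example values, for illustration only.
EXAMPLE_PREFS = """\
Work:AutoPage/pages /* Directory to save pages in */
Work:AutoPage /* Directory where program files are located */
4
25
16
"""

prefs = parse_prefs(EXAMPLE_PREFS)
print(prefs.connections, prefs.loop_delay_seconds)
```

- With these made-up values, a loop delay of 25 corresponds to half a second.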
-
- Add or remove sites in sitelist.txt with a text editor. Each line should have
- this format:
-
- URL IMAGES SHOW
-
- URL: The address. Don't forget to remove the protocol ("http://").
- IMAGES: Get images as well as the HTML page. 0 means no, 1 means yes.
- SHOW: Present the result. Useful to turn off when the page is part
- of a frameset, because HTTPJ doesn't seem to handle frames at all.
- Example: The entries for the IBrowse support page look like this:
-
- www.omnipresence.com/ibrowse/index.html 1 1
- www.omnipresence.com/ibrowse/menue_f.html 1 0
- www.omnipresence.com/ibrowse/home_f.html 1 0
-
- This gets the whole frameset without bothering you with two extra items on the
- result HTML page.
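-
- The sitelist format described above can be sketched as a small parser. This
- is a Python illustration of the "URL IMAGES SHOW" layout only, not code from
- AutoPage.rexx. The names Site, parse_sitelist_line and parse_sitelist are
- hypothetical, and stripping a leftover "http://" is an added convenience,
- not something the script is known to do.

```python
from typing import List, NamedTuple


class Site(NamedTuple):
    url: str      # address without the "http://" prefix
    images: bool  # True if images should be fetched as well
    show: bool    # True if the page should appear on the result page


def parse_sitelist_line(line: str) -> Site:
    """Parse one 'URL IMAGES SHOW' line."""
    fields = line.split()
    if len(fields) != 3:
        raise ValueError(f"expected 'URL IMAGES SHOW', got: {line!r}")
    url, images, show = fields
    if images not in ("0", "1") or show not in ("0", "1"):
        raise ValueError(f"IMAGES and SHOW must be 0 or 1: {line!r}")
    # Assumption: tolerate a forgotten protocol prefix by stripping it.
    if url.lower().startswith("http://"):
        url = url[len("http://"):]
    return Site(url, images == "1", show == "1")


def parse_sitelist(text: str) -> List[Site]:
    """Parse every non-blank line of a sitelist."""
    return [parse_sitelist_line(l) for l in text.splitlines() if l.strip()]


# The three example entries from this readme.
EXAMPLE = """\
www.omnipresence.com/ibrowse/index.html 1 1
www.omnipresence.com/ibrowse/menue_f.html 1 0
www.omnipresence.com/ibrowse/home_f.html 1 0
"""

sites = parse_sitelist(EXAMPLE)
shown = [s.url for s in sites if s.show]
print(shown)
```

- All three pages are fetched with images, but only index.html ends up in the
- list of pages to show.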
-
- Also be aware that some servers always present their pages as "new", so
- HTTPJ gets them even if they are really the same as the last time you checked.
-
- Another problem I've come across is that on certain pages HTTPJ gets images even
- when I tell it not to. I think this is a bug in HTTPJ, and I've tried to get in
- touch with the author, Piergiorgio Ghezzo, but without success. If anyone knows
- how to reach him via email, please mail me the address.
-
- Future:
- Make it work from IBrowse.
- Add direct link to the original page.
- Probably a lot of bug fixes. [Done. Well, at least two of them :)]
- Use browser hotlist as sitelist.
- Get more than one page at a time. [Done]
-
- History:
- Version 0.6, second release.
-
- [Chopped from the script]
- ** CHANGES SINCE 0.4:
- ** - Added progdir so script can be run from any path, not just current dir.
- ** - Fixed a time conversion bug that occurred if a disk file was dated
- ** xx:00:xx. Strange I didn't discover it before :)
- ** - The result page is temporarily stored in RAM to keep disk fragmentation down.
- ** - sitelist.txt was not closed when exiting.
- ** - Added possibility to receive several pages at a time.
- ** - Added a prefs file - a few more options.
-
- Disclaimer: My bad "Sunday-after-a-real-tough-Saturday-Night-English".
-